Training a Log-Linear Parser with Loss Functions via Softmax-Margin
Authors
Abstract
Log-linear parsing models are often trained by optimizing likelihood, but we would prefer to optimise for a task-specific metric like F-measure. Softmax-margin is a convex objective for such models that minimises a bound on expected risk for a given loss function, but its naïve application requires the loss to decompose over the predicted structure, which is not true of F-measure. We use softmax-margin to optimise a log-linear CCG parser for a variety of loss functions, and demonstrate a novel dynamic programming algorithm that enables us to use it with F-measure, leading to substantial gains in accuracy on CCGBank. When we embed our loss-trained parser into a larger model that includes supertagging features incorporated via belief propagation, we obtain further improvements and achieve a labelled/unlabelled dependency F-measure of 89.3%/94.0% on gold part-of-speech tags, and 87.2%/92.8% on automatic part-of-speech tags, the best reported results for this task.
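As a rough sketch of the objective (notation ours, in the spirit of the softmax-margin training papers listed under similar papers below; $\theta$ are the model weights, $f$ the feature function, $\ell$ the task loss, and $\mathcal{Y}(x_i)$ the candidate parses of sentence $x_i$), softmax-margin training minimises

$$\sum_i \Big[ -\theta^\top f(x_i, y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} \exp\big( \theta^\top f(x_i, y) + \ell(y_i, y) \big) \Big],$$

which reduces to conditional log-likelihood when $\ell \equiv 0$. The inner sum is computed by dynamic programming over the parse forest, which is why a naïve application needs $\ell$ to decompose over the parts of the predicted structure.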
Similar papers
Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions
We describe a method of incorporating task-specific cost functions into standard conditional log-likelihood (CLL) training of linear structured prediction models. Recently introduced in the speech recognition community, we describe the method generally for structured models, highlight connections to CLL and max-margin learning for structured prediction (Taskar et al., 2003), and show that the me...
Softmax-Margin Training for Structured Log-Linear Models
We describe a method of incorporating task-specific cost functions into standard conditional log-likelihood (CLL) training of linear structured prediction models. Recently introduced in the speech recognition community, we describe the method generally for structured models, highlight connections to CLL and max-margin learning for structured prediction (Taskar et al., 2003), and show that the me...
Log-Linear RNNs: Towards Recurrent Neural Networks with Flexible Prior Knowledge
We introduce LL-RNNs (Log-Linear RNNs), an extension of Recurrent Neural Networks that replaces the softmax output layer by a log-linear output layer, of which the softmax is a special case. This conceptually simple move has two main advantages. First, it allows the learner to combat training data sparsity by allowing it to model words (or more generally, output symbols) as complex combinations...
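A minimal illustration of that special case (our notation, not the paper's): a log-linear output layer scores each output symbol $y$ given hidden state $h_t$ as $p(y \mid h_t) \propto \exp(\theta^\top f(y, h_t))$ for an arbitrary feature function $f$; choosing $f$ so that $\theta^\top f(y, h_t) = w_y^\top h_t + b_y$ (one weight vector and bias per symbol) recovers the standard softmax layer, while richer feature functions allow prior knowledge about the output symbols to be built into the model.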
Penalty function maximization for large margin HMM training
We perform large margin training of HMM acoustic parameters by maximizing a penalty function which combines two terms. The first term is a scale which gets multiplied with the Hamming distance between HMM state sequences to form a multi-label (or sequence) margin. The second term arises from constraints on the training data that the joint log-likelihoods of acoustic and correct word sequences e...
The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family
Despite being the standard loss function to train multi-class neural networks, the log-softmax has two potential limitations. First, it involves computations that scale linearly with the number of output classes, which can restrict the size of problems that we are able to tackle with current hardware. Second, it remains unclear how closely it matches the task loss, such as the top-k error rate or ...
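For reference, the standard definition (not quoted from the paper): for logits $z \in \mathbb{R}^K$ and target class $y$, the log-softmax loss is $\ell(z, y) = -z_y + \log \sum_{k=1}^{K} \exp(z_k)$; the normalisation term sums over all $K$ classes, which is the linear scaling in the number of output classes referred to above.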